Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular vector matrix. This algorithm is then implemented with optimizations that (1) leverage vector instruction units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number of memory operations. We demonstrate the merits of these new QR algorithms for computing the Hermitian eigenvalue decomposition (EVD) and singular value decomposition (SVD) of dense matrices when all eigenvectors/singular vectors are computed. The approach yields vastly improved performance relative to the traditional QR algorithms for these problems and is competitive with two commonly used alternatives (Cuppen’s Divide and Conquer algorithm and the Method of Multiple Relatively Robust Representations), while inheriting the more modest O(n) workspace requirements of the original QR algorithms. Since the computations performed by the restructured algorithms remain essentially identical to those performed by the original methods, robust numerical properties are preserved.
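The reordering that lets several sets of rotations be applied together can be illustrated with a small sketch. Assume k sets of n-1 rotations, where rotation (p, j) acts on columns j and j+1 of Q and the sets are meant to be applied one after another. Rotation (p, j) must still follow (p, j-1), which shares column j, and (p-1, j+1), which shares column j+1; scheduling it at time step t = j + 2p is one schedule that respects every such constraint, and rotations in the same step touch disjoint column pairs. Each step then works on a band of roughly 2k adjacent columns, which is what makes a wavefront order cache-friendly. This is a minimal pure-Python illustration of that ordering under these assumptions, with hypothetical helper names (`apply_rotation`, `apply_sets_naive`, `apply_sets_wavefront`, `random_rotations`, `identity`); it is not the authors' optimized kernel, which additionally uses vector instructions and fused rotations.

```python
import math
import random


def apply_rotation(Q, j, c, s):
    """Apply one Givens rotation to columns j and j+1 of Q, in place.

    Q is a list of rows; column j becomes c*x - s*y and column j+1
    becomes s*x + c*y, where x and y are the old column entries.
    """
    for row in Q:
        x, y = row[j], row[j + 1]
        row[j] = c * x - s * y
        row[j + 1] = s * x + c * y


def apply_sets_naive(Q, rotations):
    """Apply set 0 completely, then set 1, and so on (one sweep over Q per set).

    rotations[p][j] holds (c, s) for the rotation of set p on columns j, j+1.
    """
    for rot_set in rotations:
        for j, (c, s) in enumerate(rot_set):
            apply_rotation(Q, j, c, s)


def apply_sets_wavefront(Q, rotations):
    """Apply the same rotations in wavefront order: rotation (p, j) at step t = j + 2p.

    Any two rotations that share a column keep their original relative order
    (for q < p and |i - j| <= 1, i + 2q < j + 2p), and rotations in the same
    step touch disjoint column pairs, so the final Q is unchanged.
    """
    k = len(rotations)
    n_rot = len(rotations[0])  # n - 1 rotations per set for an n-column Q
    for t in range(n_rot + 2 * (k - 1)):
        for p in range(k):
            j = t - 2 * p
            if 0 <= j < n_rot:
                c, s = rotations[p][j]
                apply_rotation(Q, j, c, s)


def random_rotations(k, n, seed=0):
    """Hypothetical test data: k sets of n-1 rotations with random angles."""
    rng = random.Random(seed)
    return [[(math.cos(th), math.sin(th))
             for th in (rng.uniform(0.0, 2.0 * math.pi) for _ in range(n - 1))]
            for _ in range(k)]


def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

For example, applying `random_rotations(3, 8)` to two copies of `identity(8)`, once with each function, should give the same orthogonal matrix, since every rotation sees exactly the same operands in both orders. In a compiled implementation, the benefit of the wavefront order is that the band of columns currently being rotated can stay in cache while all k sets pass over it, instead of Q being streamed from memory once per set.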
Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
A fundamental problem when adding column pivoting to the Householder QR factorization is that only about half of the computation can be cast in terms of high-performing matrix-matrix multiplications, which greatly limits the benefits that can be derived from so-called blocking of algorithms. This paper describes a technique for selecting groups of pivot vectors by means of randomized projections. It is demonstrated that the asymptotic flop count for the proposed method is $2mn^2 - (2/3)n^3$ for an $m \times n$ matrix, identical to that of the best classical unblocked Householder QR factorization algorithm (with or without pivoting). Experiments demonstrate a speedup of close to an order of magnitude relative to the geqp3 function in LAPACK when executed on a modern CPU with multiple cores. Further, experiments demonstrate that the quality of the randomized pivot selection strategy is roughly the same as that of classical column pivoting. The described algorithm is made available under an open-source license and can be used with LAPACK or libflame.
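The idea of selecting a group of pivots from a randomized projection can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: a Gaussian matrix G with b + p rows (p is a hypothetical oversampling parameter, called `oversample` here) compresses A into the small sample Y = GA, and classical greedy column pivoting is then run on Y instead of on A to choose a block of b pivot columns. Forming Y is a matrix-matrix product, and the norm computations and updates that drive the pivot choice act on a (b + p) × n matrix rather than on the m × n matrix A. All function names are hypothetical, and only the selection of one block is shown; factoring the selected columns and processing the remaining columns block by block are left out.

```python
import math
import random


def gaussian_sketch(A, ell, rng):
    """Return Y = G A, where G is an ell x m matrix of independent N(0, 1) entries."""
    m, n = len(A), len(A[0])
    G = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(ell)]
    return [[sum(G[r][i] * A[i][j] for i in range(m)) for j in range(n)]
            for r in range(ell)]


def greedy_column_pivots(Y, b):
    """Classical column pivoting on the small matrix Y.

    At each step, pick the unchosen column with the largest remaining norm,
    then project its direction out of every other unchosen column
    (modified Gram-Schmidt). Returns up to b indices in the order chosen.
    """
    ell, n = len(Y), len(Y[0])
    cols = [[Y[r][j] for r in range(ell)] for j in range(n)]
    chosen = []
    for _ in range(min(b, n)):
        norms = [-1.0 if j in chosen else math.sqrt(sum(v * v for v in c))
                 for j, c in enumerate(cols)]
        best = max(range(n), key=lambda j: norms[j])
        if norms[best] <= 0.0:
            break  # no unchosen column has a nonzero remaining component
        chosen.append(best)
        q = [v / norms[best] for v in cols[best]]
        for j in range(n):
            if j not in chosen:
                dot = sum(qi * vi for qi, vi in zip(q, cols[j]))
                cols[j] = [vi - dot * qi for qi, vi in zip(q, cols[j])]
    return chosen


def select_pivot_block(A, b, oversample=5, seed=0):
    """Choose b pivot columns of A from a Gaussian sample with b + oversample rows."""
    rng = random.Random(seed)
    Y = gaussian_sketch(A, b + oversample, rng)
    return greedy_column_pivots(Y, b)
```

As a usage example, if two columns of A are several orders of magnitude larger than the rest, `select_pivot_block(A, 2)` should return those two indices, because a Gaussian sample approximately preserves column norms and angles. Running the greedy selection on Y rather than on A is what keeps the pivoting step small enough that the bulk of the work can be done with blocked matrix-matrix operations.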